Low Latency Fault Tolerance System

نویسندگان

  • Wenbing Zhao
  • P. M. Melliar-Smith
  • Louise E. Moser
چکیده

The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed applications within a local-area network, using a leader-follower replication strategy. LLFT provides application-transparent replication, with strong replica consistency, for applications that involve multiple interacting processes or threads. Its novel system model enables LLFT to maintain a single consistent infinite computation, despite faults and asynchronous communication. The LLFT Messaging Protocol provides reliable, totally-ordered message delivery by employing a group multicast, where the message ordering is determined by the primary replica in the destination group. The Leader-Determined Membership Protocol provides reconfiguration and recovery when a replica becomes faulty and when a replica joins or leaves a group, where the membership of the group is determined by the primary replica. The Virtual Determinizer Framework captures the ordering information at the primary replica and enforces the same ordering of non-deterministic operations at the backup replicas. LLFT does not employ a majority-based, multiple-round consensus algorithm and, thus, it can operate in the common industrial case where there is a primary replica and only one backup replica. The LLFT system achieves low latency message delivery during normal operation and low latency reconfiguration and recovery when a fault occurs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

CAFT: Cost-aware and Fault-tolerant routing algorithm in 2D mesh Network-on-Chip

By increasing, the complexity of chips and the need to integrating more components into a chip has made network –on- chip known as an important infrastructure for network communications on the system, and is a good alternative to traditional ways and using the bus. By increasing the density of chips, the possibility of failure in the chip network increases and providing correction and fault tol...

متن کامل

The Low Latency Fault Tolerance System

The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed applications, using the leader-follower replication technique. The LLFT system provides application-transparent replication, with strong replica consistency, for applications that involve multiple interacting processes or threads. The LLFT system comprises a Low Latency Messaging Protocol, a Leader-Determined...

متن کامل

JetScope: Reliable and Interactive Analytics at Cloud Scale

Interactive, reliable, and rich data analytics at cloud scale is a key capability to support low latency data exploration and experimentation over terabytes of data for a wide range of business scenarios. Besides the challenges in massive scalability and low latency distributed query processing, it is imperative to achieve all these requirements with effective fault tolerance and efficient reco...

متن کامل

A Low-Latency DMR Architecture with Efficient Recovering Scheme Exploiting Simultaneously Copiable SRAM

This paper presents a novel architecture for a fault-tolerant high-performance system using a checkpoint/restart approach with dual modular redundancy (DMR). The proposed architecture can perform low-latency copy with instantaneously copiable SRAM. Furthermore, we can use an instantaneous comparison scheme that has more fault coverage than comparison with a cyclic redundancy check (CRC). Evalua...

متن کامل

Somersault Software Fault-Tolerance

software fault-tolerance, process replication failure masking, continuous availability, topology The ambition of fault-tolerant systems is to provide application transparent fault-tolerance at the same performance as a non-fault-tolerant system. Somersault is a library for developing distributed fault-tolerant software systems that comes close to achieving both goals. We describe Somersault and...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Comput. J.

دوره 56  شماره 

صفحات  -

تاریخ انتشار 2013